A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Authors

  • Adina Williams
  • Nikita Nangia
  • Samuel R. Bowman
Abstract

This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest corpora available for the task of NLI, at 433k examples, this corpus improves upon available resources in its coverage: it offers data from ten distinct genres of written and spoken English, making it possible to evaluate systems on nearly the full complexity of the language, and it offers an explicit setting for the evaluation of cross-genre domain adaptation.

Similar resources

Data-driven design of a sentence list for an articulatory speech corpus

Articulatory data offers promising developments in our understanding of speech production and advances in speech technologies. However, it is more expensive and difficult to obtain than audio data, which means data collection must be carefully planned. This paper presents a method for designing an articulatory speech corpus comparable to the widely-used TIMIT corpus, for languages other than En...

Understanding Mental States in Natural Language

Understanding mental states in narratives is an important aspect of human language comprehension. By “mental states” we refer to beliefs, states of knowledge, points of view, and suppositions, all of which may change over time. In this paper, we propose an approach for automatically extracting and understanding multiple mental states in stories. Our model consists of two parts: (1) a parser tha...

Beauty and the Beast: What running a broad-coverage precision grammar over the BNC taught us about the grammar — and the corpus

Introduction Typically, broad-coverage precision grammars are based on grammaticality judgment data and syntactic intuition, and corpus data is relegated to secondary status in guiding lexicon and grammar development. On the other end of the scale, shallow grammars are often induced directly from treebank data and make little or no use of grammaticality judgments or intuition. This tends to cau...

A Model of Language Processing as Hierarchic Sequential Prediction

Computational models of memory are often expressed as hierarchic sequence models, but the hierarchies in these models are typically fairly shallow, reflecting the tendency for memories of superordinate sequence states to become increasingly conflated. This article describes a broad-coverage probabilistic sentence processing model that uses a variant of a left-corner parsing strategy to flatten ...

An Empirical Verification of Coverage and Correctness for a General-Purpose Sentence Generator

This paper describes a general-purpose sentence generation system that can achieve both broad scale coverage and high quality while aiming to be suitable for a variety of generation tasks. We measure the coverage and correctness empirically using a section of the Penn Treebank corpus as a test set. We also describe novel features that help make the generator flexible and easier to use for a var...

Journal:
  • CoRR

Volume abs/1704.05426

Published 2017